Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition

نویسندگان

  • Xiu-Shen Wei
  • Chen-Wei Xie
  • Jianxin Wu
چکیده

Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without the fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built for aggregating the selected objectand part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, lowest feature dimensionality and highest recognition accuracy when compared with state-of-the-arts fine-grained approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

Fine-grained visual recognition typically depends on modeling subtle difference from object parts. However, these parts often exhibit dramatic visual variations such as occlusions, viewpoints, and spatial transformations, making it hard to detect. In this paper, we present a novel attention-based model to automatically, selectively and accurately focus on critical object regions with higher imp...

متن کامل

Object-centric Sampling for Fine-grained Image Classification

This paper proposes to go beyond the state-of-the-art deep convolutional neural network (CNN) by incorporating the information from object detection, focusing on dealing with fine-grained image classification. Unfortunately, CNN suffers from over-fiting when it is trained on existing finegrained image classification benchmarks, which typically only consist of less than a few tens of thousands t...

متن کامل

Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition

Research on fine-grained recognition has recently shifted from multistage frameworks to convolutional neural networks (CNN) that are trained end-to-end. Many previous end-to-end deep approaches typically consist of a recognition network and an auxiliary localization network trained with additional part annotations to detect semantic parts shared across classes. To avoid the cost of extra semant...

متن کامل

Visual descriptors for content-based retrieval of remote sensing images

In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing images. The evaluation includes global, local, and Convolutional Neural Network (CNNs) features coupled with three different Content-Based Image Retrieval schemas. We conducted all the experiments on two publicly available datasets: the 21class UC Merced Land Use/Land Cover d...

متن کامل

Fine-grained pose prediction, normalization, and recognition

Pose variation and subtle differences in appearance are key challenges to finegrained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previou...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1605.06878  شماره 

صفحات  -

تاریخ انتشار 2016